Article

Topics in 0--1 data

Authors:
Ella Bingham, Heikki Mannila, Jouni K. Seppänen

Helsinki University of Technology, FIN-02015 HUT, Finland

KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, July 2002, Pages 450–455, https://doi.org/10.1145/775047.775112

Published: 23 July 2002

ABSTRACT

Large 0--1 datasets arise in various applications, such as market basket analysis and information retrieval. We concentrate on the study of topic models, aiming at results which indicate why certain methods succeed or fail. We describe simple algorithms for finding topic models from 0--1 data. We give theoretical results showing that the algorithms can discover the epsilon-separable topic models of Papadimitriou et al. We present empirical results showing that the algorithms find natural topics in real-world data sets. We also briefly discuss the connections to matrix approaches, including nonnegative matrix factorization and independent component analysis.
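
The abstract mentions nonnegative matrix factorization as one of the related matrix approaches. As a rough illustration of that connection only, the sketch below factors a small 0--1 matrix X into nonnegative factors W and H using the multiplicative updates of Lee and Seung. It is not the algorithm of the paper, which this page does not describe; the toy data, the function names (`matmul`, `transpose`, `frobenius_error`, `nmf`), and the parameters (`k`, `iterations`, `seed`, `eps`) are all assumptions made for the example.

```python
import random


def matmul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def frobenius_error(x, w, h):
    """Squared Frobenius norm of X - WH."""
    wh = matmul(w, h)
    return sum((xv - av) ** 2 for xr, ar in zip(x, wh) for xv, av in zip(xr, ar))


def nmf(x, k, iterations=200, seed=0, eps=1e-9):
    """Approximate x (n x m) as w (n x k) times h (k x m), all entries >= 0.

    Uses multiplicative updates for the squared-error objective.
    Hypothetical helper for illustration, not the paper's method.
    """
    rng = random.Random(seed)
    n, m = len(x), len(x[0])
    w = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(n)]
    h = [[rng.uniform(0.1, 1.0) for _ in range(m)] for _ in range(k)]
    errors = [frobenius_error(x, w, h)]
    for _ in range(iterations):
        # H <- H * (W^T X) / (W^T W H), elementwise
        wt = transpose(w)
        num = matmul(wt, x)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        # W <- W * (X H^T) / (W H H^T), elementwise
        ht = transpose(h)
        num = matmul(x, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
        errors.append(frobenius_error(x, w, h))
    return w, h, errors


# Toy 0--1 data (assumed): rows are documents, columns are terms,
# with two loosely separated groups of terms.
x = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 1, 1],
]

w, h, errors = nmf(x, k=2)
```

In this sketch each row of `h` can be read as a nonnegative weighting over the terms and each row of `w` as the weights a document places on those two components. Whether such components correspond to topics in the sense of the paper is not something this example establishes.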

References

R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. In SIGMOD '93, pages 207--216, 1993.
R. Agrawal, H. Mannila, R. Srikant, H. Toivonen, and A. I. Verkamo. Fast discovery of association rules. In U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, editors, Advances in Knowledge Discovery and Data Mining, chapter 12, pages 307--328. AAAI Press, 1996.
A. L. Berger, S. A. Della Pietra, and V. J. Della Pietra. A maximum entropy approach to natural language processing. Computational Linguistics, 22(1):39--71, 1996.
I. V. Cadez, P. Smyth, and H. Mannila. Probabilistic modeling of transaction data with applications to profiling, visualization, and prediction. In KDD 2001, pages 37--46, San Francisco, CA, Aug. 2001.
M. A. Carreira-Perpinan and S. Renals. Practical identifiability of finite mixtures of multivariate Bernoulli distributions. Neural Computation, 12:141--152, 2000.
P. Comon. Independent component analysis --- a new concept? Signal Processing, 36:287--314, 1994.
G. Das, H. Mannila, and P. Ronkainen. Similarity of attributes by external probes. In Knowledge Discovery and Data Mining, pages 23--29, 1998.
S. C. Deerwester, S. T. Dumais, T. K. Landauer, G. W. Furnas, and R. A. Harshman. Indexing by latent semantic analysis. Journal of the American Society of Information Science, 41(6):391--407, 1990.
S. Della Pietra, V. J. Della Pietra, and J. D. Lafferty. Inducing features of random fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(4):380--393, 1997.
M. Gyllenberg, T. Koski, E. Reilink, and M. Verlaan. Non-uniqueness in probabilistic numerical identification of bacteria. Journal of Applied Probability, 31:542--548, 1994.
T. Hofmann. Probabilistic latent semantic indexing. In SIGIR '99, pages 50--57, Berkeley, CA, 1999.
A. Hyvärinen, J. Karhunen, and E. Oja. Independent Component Analysis. John Wiley & Sons, 2001.
D. D. Lee and H. S. Seung. Learning the parts of objects by non-negative matrix factorization. Nature, 401:788--791, Oct. 1999.
C. H. Papadimitriou, P. Raghavan, H. Tamaki, and S. Vempala. Latent semantic indexing: A probabilistic analysis. In PODS '98, pages 159--168, June 1998.
D. Pavlov, H. Mannila, and P. Smyth. Probabilistic models for query approximation with large sparse binary datasets. In UAI-2000, 2000.
D. Pavlov and P. Smyth. Probabilistic query models for transaction data. In KDD 2001, 2001.
J. W. Sammon. A nonlinear mapping for data structure analysis. IEEE Transactions on Computers, 18(5):401--409, May 1969.

Index Terms

Topics in 0--1 data
1. Information systems
  1. Information systems applications
    1. Data mining
2. Mathematics of computing
  1. Probability and statistics


Published in
KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
July 2002
719 pages
ISBN:158113567X
DOI:10.1145/775047
Conference Chair: Osmar R. Zaïane, University of Alberta, Canada
General Chair: Randy Goebel, University of Alberta, Canada
Program Chairs: David Hand, Imperial College, UK; Daniel Keim, AT&T; Raymond Ng, University of British Columbia, Canada
Copyright © 2002 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Acceptance Rates
KDD '02 Paper Acceptance Rate: 44 of 307 submissions, 14%
Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%